A Korean Information Retrieval Model Alleviating Syntactic Term Mismatches
Authors
Abstract
In Korean information retrieval, mismatches between indexing terms and query terms are a serious obstacle to improving retrieval performance. Terms fail to match because compound nouns are spaced inconsistently and because a phrase can be represented in various ways. This paper presents an extended model of Korean information retrieval that alleviates these term mismatches. In this model, we segment compound nouns into unit nouns using statistical information and a preference rule. We then synthesize unit nouns or single nouns into synthesized compound nouns under adjacency constraints. Among all candidate synthesized compound nouns, we filter out meaningless ones using mutual information and the relative frequency of category pairs. Moreover, we compute similarity with partial matching for compound nouns taken into account. The experimental results show that the proposed method can overcome the difference between surface forms of terms.
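The filtering and partial-matching steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual statistics, preference rule, adjacency constraints, or category-pair frequencies, so everything below is an assumption: candidates are taken to be pairs of adjacent unit nouns, the filter is plain pointwise mutual information with a hypothetical threshold, and partial matching is a simple overlap ratio. The function names are illustrative only.

```python
import math
from collections import Counter


def adjacent_pairs(nouns):
    """Candidate compounds: every pair of adjacent unit nouns (assumed constraint)."""
    return list(zip(nouns, nouns[1:]))


def pmi_table(noun_sequences):
    """Pointwise mutual information for each adjacent noun pair in a toy corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for nouns in noun_sequences:
        unigrams.update(nouns)
        bigrams.update(adjacent_pairs(nouns))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


def filter_candidates(table, threshold):
    """Keep only candidate compounds whose PMI reaches the (hypothetical) threshold."""
    return {pair for pair, score in table.items() if score >= threshold}


def partial_match(query_units, doc_units):
    """Share of distinct query unit nouns that also occur in the document term.

    This overlap ratio is an assumed stand-in, not the paper's similarity formula.
    """
    query_set = set(query_units)
    if not query_set:
        return 0.0
    return len(query_set & set(doc_units)) / len(query_set)


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "d"]]
    scores = pmi_table(corpus)
    print(filter_candidates(scores, threshold=2.0))
    print(partial_match(["a", "b"], ["b", "c"]))
```

With a score of this kind, a query compound that shares only some unit nouns with an indexed compound still contributes to the ranking instead of being treated as a complete miss, which is the effect the abstract attributes to partial matching.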
Similar resources
University of Glasgow at CLEF 2013: Experiments in eHealth Task 3 with Terrier
In our participation in the CLEF 2013 eHealth task 3, we investigate (1) the effectiveness of our Divergence from Randomness (DFR) framework on retrieving medical webpages, (2) the adoption of classical pseudo-relevance feedback for improving the representation of the queries, and (3) the exploitation of a collection enrichment technique for alleviating the mismatches between the terms in docum...
Using syntactic information in handling natural language queries for extended boolean retrieval model
There is considerable evidence that trained users can achieve good search effectiveness through structured boolean queries rather than simple keyword queries, because boolean operators help represent users' information needs more accurately. However, it is not normally easy for ordinary users to construct effective boolean queries using appropriate boolean operators. ...
Quasi-Synchronous Dependence Model for Information Retrieval
Incorporating syntactic features in a retrieval model has had very limited success in the past, with the exception of term dependencies. This paper presents a new term dependency modeling approach based on a dependency parsing technique used for both queries and documents. Our model is inspired by a quasi-synchronous stochastic process for machine translation [21]. It describes four different t...
Two-Level Alignment by Words and Phrases Based on Syntactic Information
As a part of work on alignment of the English and Korean parallel corpus, this paper presents a statistical translation model incorporating linguistic knowledge of syntactic and phrasal information for better translations. For this, we propose three models: First, we incorporate syntactic information such as part of speech into the word-based lexical alignment. Based on this model, we propose t...
Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System at NTCIR-6
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective. Characteristics of newspapers such as locational information are used. We present our application of Fujita’s method, where longer terms are used in retrieval by...
Journal title:
Volume, issue:
Pages: -
Publication date: 1997